Skip to content

feat(kernels): add TVM FFI Triton CUBIN wrapper#315

Open
Ma1oneZhang wants to merge 3 commits into
openinfer-project:mainfrom
Ma1oneZhang:feat/tvm-ffi-triton-cubin
Open

feat(kernels): add TVM FFI Triton CUBIN wrapper#315
Ma1oneZhang wants to merge 3 commits into
openinfer-project:mainfrom
Ma1oneZhang:feat/tvm-ffi-triton-cubin

Conversation

@Ma1oneZhang

Copy link
Copy Markdown
Contributor

Summary

Adds the MVP production-facing TVM FFI path for Triton AOT CUBIN launchers in pegainfer-kernels.

This makes tvm-ffi a required pegainfer-kernels dependency, exposes a packed TVM FFI global wrapper for the generated Qwen3.5 gated-delta-rule Triton launcher, and adds a small example showing registration/discovery of the wrapper ABI.

Changes

  • Add required tvm-ffi dependency to pegainfer-kernels.
  • Add pegainfer_kernels::triton_cubin with a TVM FFI packed wrapper for gated_delta_rule_prefill_chunk_solve_cuda.
  • Add the triton_cubin_tvm_ffi example for registering and inspecting the MVP wrapper.
  • Document the Triton CUBIN TVM FFI boundary and record the project-doc execution notes.

Validation

  • cargo fmt --all --check
  • cargo metadata --no-deps --format-version 1
  • cargo check --release -p pegainfer-kernels was run after adding /home/ziyang/gpu_memory_profiling/.venv/bin to PATH so tvm-ffi-config could be found. tvm-ffi built successfully, then the existing CUDA build failed in pegainfer-kernels/csrc/shared/flashinfer_top1.cu because the local dirty FlashInfer submodule exposes a different TopKDispatch API. The submodule change is not part of this PR.

Related

Related to #191.

This is intentionally narrower than #202: #202 validates optional bidirectional interop, while this PR makes tvm-ffi required for pegainfer-kernels and wires the first Triton CUBIN wrapper surface.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces tvm-ffi as a required dependency for pegainfer-kernels and implements a TVM FFI wrapper layer for Triton AOT CUBIN launchers, specifically exposing the Qwen3.5 gated-delta-rule triangular solve launcher. It also adds corresponding documentation, tests, and an example. Feedback on the implementation highlights a critical runtime issue in triton_cubin.rs where integer arguments are extracted directly as u64 or i32 using AnyView::try_as. Because TVM FFI packed integer arguments are always passed as i64, these strict type-matching checks will fail at runtime. The reviewer recommends extracting them as i64 first before casting.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +106 to +117
fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
let value = args
.get(idx)
.ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
if let Some(raw) = value.try_as::<u64>() {
return Ok(raw as usize);
}
if let Some(raw) = value.try_as::<*mut c_void>() {
return Ok(raw as usize);
}
Err(type_error(spec, idx, "an integer or opaque pointer"))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In TVM FFI, packed function arguments of integer type are always passed as signed 64-bit integers (i64). Since AnyView::try_as performs strict type-code matching, attempting to extract them using try_as::<u64>() will fail at runtime with a type error. To ensure robustness, we should attempt to extract the value as i64 first before trying u64 or *mut c_void.

fn arg_handle(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<usize> {
    let value = args
        .get(idx)
        .ok_or_else(|| type_error(spec, idx, "an integer or opaque pointer"))?;
    if let Some(raw) = value.try_as::<i64>() {
        return Ok(raw as usize);
    }
    if let Some(raw) = value.try_as::<u64>() {
        return Ok(raw as usize);
    }
    if let Some(raw) = value.try_as::<*mut c_void>() {
        return Ok(raw as usize);
    }
    Err(type_error(spec, idx, "an integer or opaque pointer"))
}

Comment on lines +119 to +123
fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
args.get(idx)
.and_then(AnyView::try_as::<i32>)
.ok_or_else(|| type_error(spec, idx, "an integer"))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to arg_handle, scalar integer arguments passed from Python/TVM are represented as i64 in the TVM FFI packed call. Calling try_as::<i32>() directly will fail because of strict type-code matching. We should extract the value as i64 first and then cast it to i32.

fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
    let value = args
        .get(idx)
        .ok_or_else(|| type_error(spec, idx, "an integer"))?;
    if let Some(raw) = value.try_as::<i64>() {
        return Ok(raw as i32);
    }
    if let Some(raw) = value.try_as::<i32>() {
        return Ok(raw);
    }
    Err(type_error(spec, idx, "an integer"))
}

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eed6eb7505

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pegainfer-kernels/src/triton_cubin.rs Outdated
Comment on lines +119 to +122
fn arg_i32(args: &[AnyView<'_>], spec: TritonCubinFunctionSpec, idx: usize) -> TvmResult<i32> {
args.get(idx)
.and_then(AnyView::try_as::<i32>)
.ok_or_else(|| type_error(spec, idx, "an integer"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Accept TVM i64 integer arguments

This path rejects the normal TVM integer representation for seq_len and num_value_heads: AnyView::try_as::<i32>() is an exact type check rather than a narrowing conversion, while TVM FFI integer values commonly arrive as i64 (including from Python/TVM typed calls). In that environment the new global function returns a type error before launching the CUBIN even for valid dimensions; parse i64/u64 and range-check before casting to i32.

Useful? React with 👍 / 👎.

@Ma1oneZhang

Copy link
Copy Markdown
Contributor Author

Added test coverage in 0c55194 for the TVM FFI Triton CUBIN wrapper:

  • known/unknown wrapper lookup
  • TVM FFI global registry round-trip
  • accepted raw handle encodings (u64 and opaque pointer)
  • missing-argument diagnostics before launch
  • handle/scalar type diagnostics before launch

Validation:

  • cargo fmt --all --check passed
  • PATH=/home/ziyang/gpu_memory_profiling/.venv/bin:$PATH cargo test --release -p pegainfer-kernels triton_cubin --lib built tvm-ffi, then failed before Rust test execution in the existing CUDA build at pegainfer-kernels/csrc/shared/flashinfer_top1.cu because the local dirty FlashInfer submodule has a TopKDispatch API mismatch. That submodule change remains outside this PR.

@xiaguan xiaguan left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. I like the direction of exposing these CUBIN launchers through TVM FFI, but I do not think tvm-ffi should become a required dependency of pegainfer-kernels yet.

Please put this behind a feature gate, or move it into a small optional bridge crate like pegainfer-kernels-tvm, so normal kernel builds do not require tvm-ffi-config / libtvm_ffi.

Also, the added tests currently do not compile locally:

PATH=/data/code/workspace-rustllm/pegainfer/.venv/bin:$PATH PEGAINFER_CUDA_SM=120 PEGAINFER_TRITON_PYTHON=/data/code/workspace-rustllm/pegainfer/.venv/bin/python cargo test --release -p pegainfer-kernels triton_cubin --lib -- --nocapture

fails with:

error[E0277]: tvm_ffi::Any doesn't implement Debug

This comes from expect_err(...) on call_packed(...); expect_err requires the success type to implement Debug. Please fix the tests, e.g. by matching on the Result.

Requesting changes until these are addressed.

@Ma1oneZhang

Copy link
Copy Markdown
Contributor Author

Addressed xiaguan's requested changes in 0228109:

  • Made tvm-ffi optional behind pegainfer-kernels/tvm-ffi-triton-cubin; normal pegainfer-kernels builds no longer require tvm-ffi-config or libtvm_ffi.
  • Gated pegainfer_kernels::triton_cubin with that feature and marked the example with required-features.
  • Replaced test expect_err(...) calls with explicit Result matching so tvm_ffi::Any: Debug is not required.
  • Also fixed the packed integer handling noted by automated inline reviews: pointer handles and scalar launch dimensions now accept TVM FFI i64 integers with range checks before casting.

Validation:

  • cargo fmt --all --check passed
  • cargo metadata --no-deps --format-version 1 passed
  • cargo tree -p pegainfer-kernels -e normal --no-default-features --depth 1 shows no tvm-ffi
  • cargo tree -p pegainfer-kernels -e normal --features tvm-ffi-triton-cubin --depth 1 shows tvm-ffi only with the bridge feature
  • cargo check --release -p pegainfer-kernels no longer needs tvm-ffi-config, then still stops at the pre-existing dirty FlashInfer flashinfer_top1.cu / TopKDispatch mismatch in this checkout
  • PATH=/home/ziyang/gpu_memory_profiling/.venv/bin:$PATH cargo test --release -p pegainfer-kernels --features tvm-ffi-triton-cubin triton_cubin --lib -- --nocapture stops at the same CUDA build-script mismatch before Rust tests run locally

@xiaguan

xiaguan commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@xiaguan

xiaguan commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

rebase or merge master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants